Double complex convolution and attention aggregating recurrent network for speech enhancement
Bennian YU, Yongzhao ZHAN, Qirong MAO, Wenlong DONG, Honglin LIU
Journal of Computer Applications    2023, 43 (10): 3217-3224.   DOI: 10.11772/j.issn.1001-9081.2022101533

Aiming at the problems of limited representation of spectrogram feature correlation information and unsatisfactory denoising in existing speech enhancement methods, a speech enhancement method based on a Double Complex Convolution and Attention Aggregating Recurrent Network (DCCARN) was proposed. Firstly, a double complex convolutional network was established to encode the two branches of the spectrogram features obtained by the Short-Time Fourier Transform (STFT). Secondly, the codes of the two branches were fed into inter- and intra-feature-block attention mechanisms respectively, and different speech feature information was re-labeled. Thirdly, the long-term sequence information was processed by a Long Short-Term Memory (LSTM) network, and the spectrogram features were restored and aggregated by two decoders. Finally, the target speech waveform was generated by the inverse STFT, thereby suppressing the noise. Experiments were carried out on the public dataset VBD (VoiceBank+DEMAND) and a noise-added TIMIT dataset. The results show that, compared with the phase-aware Deep Complex Convolution Recurrent Network (DCCRN), DCCARN improves the Perceptual Evaluation of Speech Quality (PESQ) score by 0.150 on VBD and by 0.077 to 0.087 on TIMIT. These results verify that the proposed method captures the correlation information of spectrogram features more accurately, suppresses noise more effectively, and improves speech intelligibility.
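The "double complex convolution" encoders above operate on the real and imaginary branches of the STFT spectrogram jointly. A minimal sketch of the underlying complex convolution rule is shown below; this is an illustrative 1-D numpy version following the standard identity (Wr + jWi)(Xr + jXi) = (WrXr − WiXi) + j(WrXi + WiXr), not the authors' network implementation (which uses 2-D complex convolution layers over time-frequency features).

```python
import numpy as np

def complex_conv1d(x_r, x_i, w_r, w_i):
    """Complex-valued convolution realized with four real convolutions.

    The real and imaginary parts of input X and kernel W are combined as
      real part: Wr*Xr - Wi*Xi
      imag part: Wr*Xi + Wi*Xr
    which is the rule complex convolutional speech-enhancement
    networks apply to the two STFT branches.
    """
    y_r = np.convolve(x_r, w_r, mode="same") - np.convolve(x_i, w_i, mode="same")
    y_i = np.convolve(x_r, w_i, mode="same") + np.convolve(x_i, w_r, mode="same")
    return y_r, y_i
```

The two-real-branch form is equivalent to convolving the complex signal with the complex kernel directly, which is easy to check against numpy's native complex `np.convolve`.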
